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Motivated by Hubert's segmentation procedure [16, 17], we discuss the application of hidden 
Markov models (HMM) to the segmentation of hydrological and enviromental time series. We use a 
HMM algorithm which segments time series of several hundred terms in a few seconds and is compu- 
^vq | tationally feasible for even longer time series. The segmentation algorithm computes the Maximum 

Likelihood segmentation by use of an expectation / maximization iteration. We rigorously prove 
algorithm convergence and use numerical experiments, involving temperature and river discharge 
time series, to show that the algorithm usually converges to the globally optimal segmentation. The 
relation of the proposed algorithm to Hubert's segmentation procedure is also discussed. 
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1 Introduction 



In this paper we discuss the following problem of time series segmentation: given a time series, divide 
^ | it into two or more segments (i.e. blocks of contiguous data) such that each segment is homogeneous, 
but contiguous segments are heterogeneous. Homogeneity / heterogeneity is described in terms of some 
appropriate statistics of the segments. The term change point detection is also used to describe the 
| problem. 

Examples of this problem arise in a wide range of fields, including engineering, computer science, 
biology and econometrics. The segmentation problem is also relevant to hydrology and environmetrics. 
For instance, in climate change studies it is often desirable to test a time series (such as river flow, 
rainfall or temperature records) for one or more sudden changes of its mean value. 
^ ■ The time series segmentation problem has been studied in the hydrological literature. The reported 

approaches can be divided into two categories: sequential and nonsequential. Sequential approaches 
often involve intervention models; see for example [14] and, for a critique of intervention models, [32]. 

Most of the nonsequential time segmentation work appearing in the hydrological literature involves 
two segments. In other words, the goal is to detect the existence and estimate the location of a single 
change point. A classical early study of changes in the flow of Nile appears in [8]. Buishand's work 
[6, 7] is also often cited. For some case studies see [15, 21, 34]. Bayesian approaches have recently 
generated considerable interest [27, 28, 29, 30, 32]. 

It appears that the multiple change point problem has not been studied as extensively. Hubert's 
segmentation procedure [16, 17] is an important step in this direction. The goodness of a segmentation is 
evaluated by the sum squared deviation of the data from the means of their respective segments; in what 
follows we will use the term segmentation cost for this quantity. Given a time series, Hubert's procedure 
computes the minimal cost segmentation with K =2, 3, ... change points. The procedure gradually 
increases K; for every value of K the best segmentation is computed; the procedure is terminated when 
differences in the means of the obtained segments are no longer statistically significant (as measured 
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by Scheffe's contrast criterion [33]). Hubert mentions that this procedure can segment time series with 
several tens of terms but is "... unable at the present state to tackle series of much more than a hundred 
terms ..." because of the combinatorial increase of computational burden [17]. 

The work reported in this paper has been inspired by Hubert's procedure. Our goal is to develop 
an algorithm which can locate multiple change points in hydrological and/or environmental time series 
with several hundred terms or more. To achieve this goal, we adapt some hidden Markov models (HMM) 
algorithms which have originally appeared in the speech recognition literature. (A survey of the relevant 
literature is postponed to Section 3.3.) We introduce a HMM of hydrological and/or enviromental time 
series with change points and describe an approximate Expectation / Maximization (EM) algorithm 
which produces a converging sequence of segmentations. The algorithm also produces a sequence of 
estimates for the HMM parameters. Time series of several hundred points can be segmented in a few 
seconds (see Section 4), hence the algorithm can be used in an interactive manner as an exploratory 
tool. Even for time series of several thousand points the segmentation time is in the order of seconds. 

This paper is organized as follows. In Section 2 we review Hubert's formulation of the time series 
segmentation problem. In Section 3 we formulate the segmentation problem in terms of hidden Markov 
models and present a segmentation algorithm; also we compare the hidden Markov model approach with 
that of Hubert. We present some segmentation experiments in Section 4. In Section 5 we summarize 
our results. Finally, in the Appendix we present an alternative, non-HMM segmentation method, which 
is more accurate but also slower. 



2 Time Series Segmentation as an Optimization Problem 

In this section we formulate time series segmentation as an optimization problem. We follow Hubert's 
presentation, but we modify his notation. 

Given a time series x = (x\, X2, ■ ■■ , xt) and a number K, a segmentation is a sequence of times 
t = (to, ti, ... , tx) which satisfy 

= t < h < ... < t K -i <t K = T. (1) 

The intervals of integers [to + 1, h], [t\ + 1, ...,£2]; ••• , \Pk-i + ^,tx] are the segments; the times to, 
t\, ... , tx are the change points. K, the number of segments, is the order of the segmentation. The 
length of the fc-th segment (for k = 1, 2, K) is denoted by = t& — tfe-i- The following notation is 
used for a given segmentation t = (to, ti, ... , tx)- For k = 1,2, ...,K, define 



Hk = y , d k = 2^ (zt-Mfc) • (2) 

k t=t fc -i+i 

Define the cost of segmentation t = (to, tx) by 

K K t k 

D K (t) = Y,dk = J2 E te-fo) 2 - ( 3 ) 

k=l fc=lt=tfc_l+l 

If Dk has a small value, then the segments are homogeneous, i.e. the x^s are close to jEt^ for k = 
1, 2, K and for t = tk-i + 1, tfc- Now we can define the best K-th order segmentation t to be the 
one minimizing Dx{t) and denote the minimal cost by Dk = Dx(t). Note that for every K we have 
Dr > Dk+i [16]. Also, there is only one segmentation t of order T; in this case every time instant t 
is a segment by itself and Dx(t) = 0. 
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It can be seen [16] that the number of possible segmentations grows exponentially with T. To 
efficiently search the set of all possible segmentations, Hubert uses a branch- and-bound approach. 
Even so, the computational load increases excessively with T and this approach is not able currently 
(in 2000) to segment series of much more than a hundred terms [17]. 

Minimization of Dk can be achieved by several alternative (and faster than branch-and-bound) 
algorithms. A dynamic programming approach is presented in the Appendix to obtain the globally 
minimum cost; this is feasible for T in the order of several hundreds and will be reported in greater 
length in a future publication [20]. In this paper a different approach is followed, which is based on 
HMM's. 

3 Hidden Markov Models 

We now present a HMM formulation of the time series segmentation problem. HMM's have been used 
for runoff modeling [25] and the possibility of using them for hydrological time series segmentation 
has been mentioned in [30] but, as far as the author knows, an actual implementation has not been 
presented yet. On the other hand, we have already mentioned that HMM's are used for segmentation 
of time series in several other fields (see the discussion in Section 3.3). 

The term "hidden Markov model" is used to denote a broad class of stochastic processes; here we use 
a particular and somewhat restricted species of HMM to model a hydrological time series and present 
an approximate Expectation / Maximization (EM) algorithm to perform Maximum Likelihood (ML) 
segmentation. In addition to the standard probabilistic interpretation of the algorithm, a numerical 
optimization point of view is also possible and we use the latter to prove the convergence of the 
algorithm. Finally we discuss related algorithms and possible extensions. 

3.1 HMM's and Hydrological Time Series 

We will use a pair of stochastic processes (Z t ,X t ) to model a hydrological time series with change 
points. We start by considering a simple example. 

The annual flow of a river is denoted by Xf. We assume that, for the years t = 1, 2, ...,t±, Xt is a 
normally distributed random variable with mean [ii and standard deviation a. In year ti a transition 
takes place and, for the years t = t\ + 1, t\ + 2, ...,*2, X t is normally distributed with mean ^ and 
standard deviation a. This process continues with transitions taking place in years t 2 , £3, ••• , tx-i- 
This process is illustrated in Figure 1. We indicate the states of the river flow by circles and the 
possible transitions from state to state by arrows; note that the states are unobservable. We indicate 
the observable time series by the double arrows emanating from the states. 

Figure 1 to appear here 

The above mechanism can be modeled by a pair of stochastic processes (Z t ,X t ) (with t = 0,1, 2, ...) 
defined as follows. 

1. Zf , which is the state process, is a finite state Markov chain with K states; it has initial probability 
vector 7r and transition probability matrix P. Hence, for any T, the joint probability function of 
Zq, Z\, Zt is 

Pr(Zi = Zi, Z 2 = Z 2 , Z T = Z T ) = TT Z0 ■ P Z(hZl ■ P zx ,zi ■ ■■■ ■ Pz T -l,ZTn- ( 4 ) 

For the specific example discussed above, it will also be true that: (a) n± = 1, tt^ = for 
k = 2,3, K, (b) Pkj = for k = 1, 2, K and all j other than k, k + 1. The parameters of 
this process are K and P. 
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2. Xf , which is the observation process, is a sequence of conditionally independent, normally dis- 
tributed random variables with mean fj,z t and standard deviation a. More precisely, for every t, 
the joint probability density of Xi, X 2 , X t conditioned on Z\, Z 2 , Z t is 



fx u x 2 ,...,x t \z 1 ,z 2 ,...M x ^ x 2,---^t\z 1 ,Z2,...,z t ) =JJe ( x « ^)' 2 °\ (5) 

i=i 

The parameters of this process are fi2, ... , and a. We will often use the notation M = 
[m, n 2 , ••• , \ik ]■ 

The (Z t ,X t ) pair is a HMM, in particular a left-to-right continuous HMM [31]. "Left-to-right" 
refers to the structure of state transitions (as depicted in Figure 1) and "continuous" refers to the fact 
that the observation process is continuous valued. The model parameters are K, P, M, a. 

There is a one-to-one correspondence between state sequences z = (z\, Z2, ■ ■■ , zt) and segmentations 
t = (to, ti, ... , tx')- For example, given a particular z, we obtain the corresponding t by locating 
the times tu such that z tk 7^ zt k +i, for k = 1,2,..., K' — 1 (and setting to = and tx' = T). The 
postulated Markov chain only allows left-to-right transitions, hence K' < K, i.e. there will be at most 
K segments, and every segment will be uniquely associated with a state. 

The conditional likelihood of a state sequence z {given an observation sequence x) is denoted by 

l k,t( z \x;P, m ><7) = L KjT (z!, z 2 , zr\xi,x 2 , ...,x T ; P,M,a) (6) 

and the joint likelihood of a state sequence z and an observation sequence x is denoted by 

L 2 KT (z,x;P, M, a) = L 2 KT (z 1 ,z 2 , ...,zr,xi,x 2 , x T ; P, M,cr). (7) 

are understood as functions of z = (z\, z 2 , zt); the observations x = (xi, x 2 , ... , 
xt), the number of segments K, and the length of the time series T, as well as the parameters P, M, 
a are assumed fixed. In place of T any t can be used, to indicate the likelihood of the subsequence 
(zi,z 2 , —,Zt) given (xi,X2, ■•■,#<) etc. For example, we can write 

L \M^ Z ^ -,zt,x 1 ,X2,...,x t ;P,M,a). (8) 

Note also that 

Lr,t = A ■ L K T (9) 
where A is a quantity independent of (z±, z 2 , zt). Finally, from (4), (5) we have 

L^ T (z,x;P,M,a) = [] ■ e<*-»*? , (10) 

where zo = 1, according to the previously stated assumption. 
3.2 The Segmentation Algorithm 

The ML segmentation t can be obtained from the ML state sequence z = (z\, z 2 , ... , zt). Since 
state sequences are unobservable, we will estimate z in terms of the observable sequence x = (xi, X2, 
... , xt) and the parameters K, P, M, a. Note that in practice K, P, M, a will also be unknown. 
Hence the computation of the maximum likelihood HMM segmentation must be divided into two 
subtasks: (a) estimating the HMM parameters and (b) computing the actual segmentation. We follow 
the standard approach used in HMM problems: a parameter estimation phase is followed by a time 
series segmentation phase and the process is repeated until convergence. This is the Expectation / 
Maximization (EM) approach. First we discuss estimation and segmentation in more detail; then we 
will return to a discussion of the EM approach. 
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3.2.1 Parameter Estimation 



Suppose, for the time being, that a segmentation t = (to,ti,..., t m ) is given. A reasonable estimate of 
M = [/xi, fj,2, ■■■ , Hk], dependent on the given segmentation, is (for k = 1,2, ...,-ff) 



(11) 



Similarly we could use the following segmentation- dependent estimates of a (for k = 1,2, ...,-ff) 



r fc -i 



(12) 



However, to maintain compatibility with Hubert's approach, we will use the segmentation-independent 
estimate 



a 



T — 1 



(13) 



where 



Let us now turn to the transition probability matrix P. In a left-to-right HMM, for k = 1,2, X 
and all j different from k and k + 1, we will have Pfcj = 0. Also, for /c = 1, 2, K — 1 we will have 
Pk,k+i = 1 — Pfc.fe- Hence P only has -ftT — 1 free parameters, namely Pi,i, P2,2 5 ••• , Pk-\,k-\- These 
could be estimated from the given segmentation. However, in this paper we use a simpler approach. 
Namely, we assume 



p 1-p ... 
p 1-p ... 


















... p 1 — p 
1 



(14) 



Hence P is determined in terms of a single parameter p, which will be chosen a priori, rather than 
estimated. We have found by numerical experimentation that the exact value of p is not critical; in 
all the examples of Section 4, the segmentation algorithm performs very well using p in the range 
[0.85,0.95]. 

Finally, we must make a choice regarding the number of segments K. We will use Hubert's approach, 
and take a sequence of increasing values: K = 2, 3, ... until a value of K is reached which yields statis- 
tically nonsignificant segmentations (statistical significance is evaluated by Scheffe's contrast criterion, 
[16, 33]). 



3.2.2 Segmentation 

Given observations x = {x\,x%, —,xt) and assuming the parameters K, P, M, a to be known, the 
Maximum Likelihood (ML) state sequence is the z = (z\,Z2, zr) which maximizes L l K r (z|x; P, M,cr) 
as function of z. The ML segmentation t = (to, t±, ... , t^ 1 ) is obtained from z. It will be seen in 
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Section 3.2.4 that, under certain circumstances, z also minimizes the segmentation cost Dk defined 
in Section 2. 

z = {z\,z 2 , ...,Sr) can be found by the Viterbi algorithm [11], a computationally efficient dynamic 
programming approach. In view of (9) we have 

(zi,z 2 , ...,zt) = arg max L 2 K T (zi, z 2 , zt, xi, x 2 , xt; P, M,cj). (15) 

Zi,Z2,...,ZT 

Now, for t = 1, 2, T and fc = 1, 2, K define 

q k ,t= max L 2 Kt (zi, z 2 , zt-i, A;, xi, x 2 , £*; P, M,cr) (16) 

Z\,Z2, — ,Zt—l 

It can be shown by standard dynamic programming arguments [5] that both z = {z-y, z 2 , 3r) an d 
the gfe ; t's of (16) can be computed recursively as follows. 



Viterbi Algorithm 
Input: The time series x\,x 2 , ...,xt ', the parameters K, P, M and a. 
Forward Recursion 

Set qi fi = 1, q 2 ,o = <?3,o = ••• = Qk,o = 0. 
For t = 1,2, ...,T 

For k = 1,2,..., K 

1k,t = ™** K ( qj ,t-i ■ Pj,k ■ I**) 
r M = ar gi max. (q j>t -i ■ P h k ■ e^**-^ ^) . 

End 

End 

Backtracking 

L 2 KT = m.ax!< k < K (qk,r) 
z T = argmax!< fe <^ (g fcjT )- 
For t = T,T — 1, 2 

End 



Upon completion of the forward recursion, L 2 KT , the maximum value of L 2 KT , is obtained. The 
backtracking phase produces the state sequence which maximizes L 2 KT (and hence also L l KT ). Exe- 
cution time is of order 0(T • K 2 ) which is linear (rather than exponential) in the length of the time 
series T. This makes the algorithm computationally feasible even for long time series. For more details 
on the Viterbi algorithm see [11]. 
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3.2.3 Combined Parameter Estimation and Segmenation 

Parameter estimation and segmentation can be combined in an algorithm which maximizes the like- 
lihood viewed as a function of both the state sequence z = (zi,Z2,—, zr) and the parameters M. 
The algorithm presented below is an iterative Expectation / Maximization (EM) algorithm [9] which 
produces a converging sequence of segmentations. 



HMM Segmentation Algorithm 
Input: The time series x = (xi,X2, ...,ir) ; the parameters K, P; a termination variable e. 
Choose randomly a state sequence z^ ) = (z^\ zf\..., z^). 
Compute a from (13). 
For i = 1,2,... 

Compute from z^ _1 ). 
Compute MW from tW and (11). 

Compute zW by the Viterbi algorithm using x, K, P, MW and a. 
If |L^ T (zW,x;P, MW,5) - L^ T (z( i-1 ),x; P, M^ _1 ) ,5)1 < e. 

z = zW. 
Exit the loop 
Endlf 

End 



In Section 3.2.4 we will show that the above algorithm is a very close approximation to an EM 
algorithm and that, under certain conditions, every iteration increases the likelihood function. In all 
the examples presented in Section 4 the algorithm converges to the global maximum with very few 
iterations (typically 3 or 4). In other words, the outer loop of the algorithm is executed only a few 
times; in each execution we perform a parameter reestimation according to (11) (with execution time 
O(T)) and a segmentation by the Viterbi algorithm (with execution time 0(T • K 2 )). Hence the total 
execution time for a fixed K value is 0(T • K 2 ). 

For a complete segmentation procedure the above algorithm is run for a sequence of increasing 
values K = 2,3, ... . First the algorithm is used to obtain the ML segmentation of order K =2; the 
difference of the means of the two segments is tested for statistical significance by the Scheffe criterion 
(for details see [16] and [33]). If the difference is not significant, then it is concluded that the entire 
time series consists of a single segment. If the difference is significant, the algorithm is run with K = 3 
and the Scheffe test is applied to the resulting segments. The process is continued until, for some value 
of K, a segmentation is obtained which fails the Scheffe test (or until we reach K = T, an unlikely 
case) . 

The use of Scheffe's contrast criterion to determine the true value of K is somewhat problematic. 
This point is discussed in some detail in [16]. Many methods for the determination of K have been 
proposed in the literature, but none of these completely resolves the problem. In cases of doubt, a 
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pragmatic approach would be to use human judgement to evaluate segmentations with different K's. In 
the case of hydrological and environmental time series which involve a rather small number of segments, 
this is relatively easy. The short execution time of the segmentation algorithm favors this approach, 
since experimentation in an "interactive" mode is feasible. 

3.2.4 Convergence 

The goal of this section is to show that, for a fixed K, every iteration of the HMM segmentation 
algorithm increases the likelihood; since the likelihood is bounded above by one, this also implies that 
the algorithm converges. 

Two approaches can be used. The first approach is based upon the probabilistic interpretation of 
the algorithm; since this is a routinely applied analysis of EM algorithms, it will be presented only in 
outline. In the second approach, the segmentation algorithm is viewed from a numerical optimization 
point of view and convergence is proved without using any probabilistic assumptions; furthermore this 
approach shows clearly the connection of our segmentation algorithm to Hubert's procedure. 
Probabilistic Approach. As explained in [9], the basic ingredient of the EM family of algorithms is 
the iterative application of an expectation step followed by a likelihood maximization step. In our case 
the expectation step consists in estimating by (11) and the maximization step consists in finding 
zW by the Viterbi algorithm. 

While the Viterbi algorithm computes exactly the global maximum of the likelihood (viewed as 
a function of z only!), the estimation step used in this paper is approximate. The exact step would 
involve computing estimates of jui, JI2, jix for every possible segmentation and then combining these 
estimates in a sum weighted by the respective probability of each segmentation (a similar approach 
should be used for a, using the estimates of (12)). This approach is used in [10] and elsewhere; while it 
is computationally more expensive than the approach used here, it is still viable. At any rate, in most 
cases the two approaches yield very similar results. 

If it is assumed that the estimate of (11) is a close approximation to the maximum likelihood 
estimate of M, then convergence can be established by a standard EM argument presented in [9, 24] 
and several other places. This argument shows that a certain cross entropy Q(zW, z^ -1 )) is decreased 
by every iteration of an EM algorithm. Since Q is always nonnegative, it must converge to a nonnegative 
number, and this suffices for the algorithm to terminate. Furthermore, by relating Q(x^\ z^ -1 ^) to the 
likelihood, it can be shown that the sequence Lk,t( z ^) is monotonically increasing. 
Numerical Approach. In what follows we will consider K, P, x, a to be fixed. We will denote the 
set of all possible state sequences by and the set of all state sequences with K transitions by $x; we 
will also use the standard notation H K for the set of all K-dimensional real vectors. 

Taking the negative logarithm of (10) we obtain 

- log [L^ T (z, x; P, M,a)] = - ^ log (P Zt _ uZt ) + £ (a * . (17) 

t=i t=i 

We define 4>(z) = "number of times zt-i 7^ zt"; in other words, 0(z) is the number of transitions in the 
state sequence z. If we limit ourselves to state sequences z € &k, then obviously <fi(z) = K. Now, for 
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all z G (17) becomes 

T , ,2 

- log [L% T (z, x; P, M,<r)] = — ({T — <f>(z)) ■ log (p) + 0(z) ■ log (1 - p)) + £ ^_^L (i 8 ) 

t=i a 

= -(( r _^ ) .i og(p) + K. 1 og(i-p)) + ^ (a:t ~^ ) => (19) 

t=\ ° 

= C(T,K,P)+f2 iXt ~ t t Zt)2 - (20) 
where C(T, if, P) = - [(T - K) ■ log (p) + K ■ log (1 - p)]. Now we define the function 

T 

J(z,M) = ^(x t - / u Zi ) 2 (21) 
t=i 

and note that 

J(z, M) = -2a 2 • (log [L 2 K:T (z, x; P, M,cr)] + C(T, K, P)) . (22) 

Note that, for simplicity of notation, we write J(z, M) as a function only of z, M; the quantities T, if, 
P, x, <7 can be considered fixed. 

Now consider a run of the segmentation algorithm which produces a sequence z(°), z^\ z^> ... , 
zW, ... . Suppose that for every s we have zW g= <£^-. By the reestimation formula for MW we will 
have for every s: 

VMeR^: J(z^ 1 );MW) < J(z^ 1 ) ;M). (23) 

Furthermore, note that the Viterbi algorithm yields the global maximum of the likelihood as a function 
of z. Hence, from (22) and the reestimation formula for zW we will have for every i: 

Vz £ $ K : J(z«;M^) < J(z;M«). (24) 

Now, using first (24) and then (23), we obtain 

J(z« ; MW ) < J{z {i -^ ; M« ) < J{z^ ; M^ 1 ) ) (25) 

and, from (25) and (22), 

L 2 KtT (z® , x; P, MW ,<t) > ^(z^- 1 ) , x; P, M^ 1 ) ,ct) (26) 



Hence, i//or every i we have zW G then the sequence ^L 2 K T (z^\ x; P, MW,ct)| 



oo 

is increas- 

i=0 

ing; since it is also bounded from above by one, it must converge. It follows that the HMM segmentation 
algorithm produces a sequence of segmentations with increasing and convergent likelihood; from con- 
vergence of the likelihood we also conclude that the algorithm will eventually terminate. Furthermore, 
if is the segmentation obtained from zW is easy to check that 

D K (t®) = J(zW;MW). (27) 

From (23), (27) follows that Hubert's segmentation cost is decreased in every iteration of the HMM 
segmentation algorithm. 
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For the above analysis to hold, we have required that 6 &k for every i. This condition is easy 
to check; it is usually satisfied in practice; and it can be enforced by choosing the parameter p to be 
not too close to 1 (if p ~ 1, then the cost of state transitions is very high and transitions are avoided). 

One way to interpret the above analysis is the following: using an appropriate value of p, the 
segmentation algorithm presented here becomes an iterative, approximate way to find Hubert's optimal 
segmentation. The approximation is usually very good, as will be seen in Section 4. This interpretation 
is completely nonprobabilistic and does not depend on the use of the hidden Markov model. 
Computational Issues. We must also mention that succesful implementation of the Viterbi algorithm 
requires a normalization of the qk,t' s to avoid numerical underflow; alternatively one can work with the 
logarithms of the the qk/s and perform additions rather than multiplications. 

3.3 Discussion and Extensions 

An extensive mathematical, statistical and engineering literature covers both the theoretical and applied 
aspects of HMM's. The reader can use [10, 31] as starting points for a broader overview of the subject. 
EM-like algorithms for HMM's were introduced in [4, 3, 2, 24]. The EM family of algorithms was 
introduced in great generality in [9]; work on HMM's also appears in the econometrics [13, 23], as well 
as in the biological [22] literature. These references are merely starting points; the literature is very 
extensive. 

As already mentioned, the EM segmentation algorithm used here is a variation of algorithms which 
are well-established in the field of speech recognition; for example see [18, 19]. Taking into account the 
extensive HMM literature, as well as various ideas reported in the hydrological literature, the algorithm 
of Section 3.2.4 can be extended in several directions. 

1. The assumption that the observations are normally distributed is not essential. Other forms of 
probability density can be used in (10). Similarly, by a simple modification of (10) the algorithm 
can handle vector valued observations. 

2. A basic idea of the algorithm is that each segment must be homogeneous. Assuming that the 
observations within a segment are generated independently and normally, segment homogeneity 
is evaluated by the deviation of xt k _ 1 +i,x tk _ 1+2 , ■ ■■,xt k from the segment mean ju^. But alterna- 
tive assumptions can be used. For example, assume that the observations are generated by an 
autoreggressive mechanism, i.e. that, for t = tk-i + 1, tk-\ + 2, tk and k = 1, 2, K, we have 

xt = ao,fc + ai,kXt-i + a 2 ,kX t -2 + ••• + a hk x t -i + e t (28) 

(where et is a white noise term). The segmentation algortithm can be used within this framework. 
In this case the reestimation phase computes the AR coefficients a\^, 02,fc> ••• , ai,k, which can 
be estimated from xt k _ 1 +i, Xt k _ 1+2 , ... , xt k using a least squares fitting algorithm. This approach 
is used in Section 4.3 to fit a HMM autoregressive model to global temperature data. 

3. Similarly, it may be assumed that the observations are generated by a polynomial regression of 
the form (for t = tk-i + 1, ifc-i + 2, tk and k = 1, 2, K) 

xt = ao.fc + ai,k ■ {t ~ tk-i) + ••• + o«,fc • {t ~ tk-i) 1 + e t (29) 

where et is a noise term. Again, the coefficients ao,fc, ai,k, ■■■ , o-i,k ca n be computed at every 
reestimation phase by a least squares fitting algorithm. Additional constraints can be used to 
enforce continuity across segments. In the case of 1st order polynomials there are only two coeffi- 
cients, ao,fc, ai,k, which are determined by the continuity assumptions; the iterative reestimation 
of the change points can still be performed. This case may be of interest for detection of trends. 
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4. It has been mentioned in Section 3.2.1 that P can also be reestimated in every iteration of the 
EM algorithm. Preserving the left-to-right structure implies that for k = 1,2,..., K and for all 
j different from k and k + 1, we have Pkj = 0; furthermore, for k = 1,2,..., K — 1 we have 
Pk,k+i = 1 ~~ Pk,k- The Pk t k parameters can be estimated by = . However, some 
preliminary experiments indicate that this approach does not yield improved segmentations. 

5. On the other hand, the treatment of the state transition can be modified in a more substantial 
manner by dropping the left-to-right assumption. In the current model each state of the Markov 
chain corresponds to a single segment and, because of the left-to-right structure, it is visited at 
most once. An alternate approach would be to assign some physical significance to the states. 
For instance, states could be chosen to correspond to climate regimes such as "dry", "wet" etc. 
In this case a state could be visited more than once. This approach allows the choice of models 
which incorporate expert knowledge about the evolution of climate regimes. On the other hand, 
if the left-to-right structure is dropped, the number of free parameters in the P matrix increases. 
These parameters could be estimated (conditional on a particular state sequence) by 

p no. of times that z t = k and zt+i = j 
^ no. of times that zt = k 

The enhancements of arbitrary transition structure and transition probability estimation are 
easily accommodated by our algorithm. 



4 Experiments 

In this section we evaluate the segmentation algorithm by numerical experiments. The first experiment 
involves an annual river discharge time series which contains 86 points. The second example involves 
the reconstructed annual mean global temperature time series and contains 282 points. Both of these 
examples involve segmentation by minimization of total deviation from segment means. The third 
example again involves the annual mean global temperature time series, but performs segmentation 
by minimization of autoregressive prediction error. The fourth example involves artificially generated 
time series with up to 1500 points. 



4.1 Annual Discharge of the Senegal River 

In this experiment we use the time series of the Senegal river annual discharge data, measured at the 
Bakel station for the years 1903-1988. The length of the time series is 86. The same data set has 
been used by Hubert [16, 17]. The goal is to find the segmentation which is optimal with respect to 
total deviation from the segment means, has the highest possible order and is statistically significant 
according to Scheffe's criterion. 

We run the segmentation algorithm for increasing values of K. In the experiments reported here 
we have always used p = 0.9 (similar results are obtained for other values of p in the interval [0.85, 
0.95]. For every value of K, convergence is achieved by the 3rd or 4th iteration of the algorithm. The 
optimal segmentations are presented in Table 1. The segmentations which were validated by the Scheffe 
criterion appear in bold letters. 
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K 


Segment Boundaries (Change Points) 


1 


1902 


1988 












2 


1902 


1967 


1988 










3 


1902 


1949 


1967 


1988 








4 


1902 


1917 


1953 


1967 


1988 






5 


1902 


1921 


1936 


1949 


1967 


1988 




6 


1902 


1921 


1936 


1949 


1967 


1971 


1988 



Table 1 



Hence it can be seen that the optimal and statistically significant segmentation is that of order 5, 
i.e. the segments are [1903,1921], [1922,1936], [1937,1949], [1950,1967], [1967,1988]. That this is the 
globally optimal segmentation, has been shown by Hubert in [16, 17] using his exact segmentation 
procedure. A plot of the time series, indicating the 5 segments and the respective means appears in 
Figure 2. 

Figure 2 to appear here 

We have verified that the HMM algorithm finds the globally optimal segmentation for all values of 
K (as listed in Table 1). We performed this verification by use of the exact dynamic programming algo- 
rithm presented in the Appendix. The conclusion is that, in this experiment, the HMM segmentation 
algorithm finds the optimal segmentations considerably faster than the exact algorithm. Specifically, 
running the entire experiment (i.e. obtaining the HMM segmentations of all orders) with a MATLAB 
implementation of the HMM segmentation algorithm took 1.1 sec on a Pentium III 1 GHz personal 
computer; we expect that a FORTRAN or C implementation would take about 10% to 20% of this 
time. 

4.2 Annual Mean Global Temperature 

In this experiment we use the time series of annual mean global temperature for the years 1700 - 1981. 
Only the temperatures for the period 1902 - 1981 come from actual measurements; the remaining 
temperatures were reconstructed according to a procedure described in [26] and also at the Internet 
address http://www.ngdc.noaa.gov/paleo/ei/eiJ.ntro.html. The length of the time series is 282. 
The goal is again to find the segmentation which is optimal with respect to total deviation from the 
segment-means, has the highest possible order and is statistically significant according to Scheffe's 
criterion. 

We run the segmentation algorithm for K = 2,3, ...,6, using p = 0.9. Convergence takes place in 4 
iterations or less. The optimal segmentations are presented in Table 2. The segmentations which were 
validated by Scheffe's criterion appear in bold letters. 



K 


Segment Boundaries (Change 


Points) 






1 


1700 


1981 












2 


1700 


1930 


1981 










3 


1700 


1812 


1930 


1981 








4 


1700 


1720 


1812 


1930 


1981 






5 


1700 


1720 


1812 


1926 


1935 


1981 




6 


1700 


1720 


1812 


1926 


1934 


1977 


1981 


Tal 


ble 2 
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Hence it can be seen that the optimal and statistically significant segmentation is of order 4, i.e. 
the segments are [1700,1720], [1721,1812], [1813,1930], [1931,1981]. A plot of the time series, indicating 
the 4 segments and the respective means appears in Figure 3. 

Figure 3 to appear here 

The total execution time for the experiment (i.e. to obtain optimal segmentations of all orders) is 
2.97 sec. The segmentations of Table 2 are the globally optimal ones, as we have verified using the 
dynamic programming segmentation algorithm. 

4.3 Annual Mean Global Temperature with AR model 

In this experiment we again use the annual mean global temperature time series, but now we assume 
that it is generated by a switching regression HMM. Specifically, we assume a model of the form 

xt = ao,fc + ai,kx t -i + a 2 ,kX t -2 + a 3jk Xt-3 + e* (31) 

where the parameters ao k , a\ki 0,2 hi a 3,fc are specific to the A;-th state of the underlying Markovian 
process. Given a particular segmentation, these parameters can be estimated by a least squares fitting 
algorithm. Hence the segmentation algorithm can be modified to obtain the optimal segmentation with 
respect to the model of (31). 

Once again we run the segmentation algorithm for K = 2,3, ...,6, using p = 0.9. The optimal 
segmentations thus obtained are presented in Table 3. 



K 


Segment Boundaries (Change 


Points) 






1 


1700 


1981 












2 


1700 


1926 


1981 










3 


1700 


1833 


1926 


1981 








4 


1700 


1769 


1833 


1926 


1981 






5 


1700 


1769 


1833 


1895 


1926 


1981 




6 


1700 


1769 


1825 


1877 


1904 


1926 


1981 


Tal 


ble 3 



In this case segment validation is not performed by the Scheffe criterion; instead we use a prediction 
error correlation criterion. This indicates the maximum statistically significant number of segments is 
K=4 and the segments are [1700,1769], [1770,1833], [1834,1926], [1927,1981]. A plot of the time series, 
indicating the 4 segments and the respective autoregressions appears in Figure 3. 

Figure 4 to appear here 

Recall that the segments obtained by means-based segmentation are [1700,1720], [1721, 1812], 
[1813, 1930], [1931, 1981]. This seems to be in reasonable agreement with the AR-based segmentation, 
excepting the discrepancy of 1720 and 1769. From a numerical point of view, there is no a priori reason 
to expect that the AR-based segmentation and means-based segmentation should give the same results. 
The fact that the two segmentations are in reasonable agreement, supports the hypothesis that actual 
climate changes have occurred approximately at the transition times indicated by both segmentation 
methods. 

Finally, let us note that the total execution time for the experiment (i.e. to obtain optimal segmen- 
tations of every order) is 3.07 sec and that the segmentations of Table 3 are the globally optimal ones, 
as we have verified using the dynamic programming segmentation algorithm. 
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4.4 Artificial Time Series 



The goal of the final experiment is to investigate the scaling properties of the algorithm, specifically 
the scaling of execution time with respect to time series length T and the scaling of accuracy with 
respect to noise in the observations. To obtain better control over these factors, artificial time series 
are used, which have been generated by the following mechanism. 

The time series are generated by a 5-th order HMM. Every time series is generated by running the 
HMM from state no.l until state no. 5. Hence, every time series involves 5 state transitions and, for the 
purposes of this experiment, this is assumed to be known a priori. On the other hand, it can be seen 
that the length of the time series is variable. With a slight change of notation, in this section T will 
denote the expected length of the time series, which can be controlled by choice of the probability p. 
The values of p were chosen to generate time series of average lengths 200, 250, 500, 750, 1000, 1250, 
1500. 

The observations are generated by a normal distribution with mean {k= 1, 2, 5) and standard 
deviation a. In all experiments the values fi\= ^3= ^5= 1, [12= 114= —1 were used. Several values of 
a were used, namely a= 0.00, 0.10, 0.20, 0.30, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00. 

For each combination of T and a, 20 time series were generated and the HMM segmentation algo- 
rithm was run on each one. For each run two quantities were computed: c, accuracy of segmentation, 
and T e , execution time. Segmentation accuracy is computed by the formula 

„ _ E^i = *t) 
T 

where the indicator function l(z t = %) is equal to 1 when zt = % and equal to otherwise. 

From these data two tables are compiled. Table 4 lists T e (in seconds) as a function of T (i.e. T e 
is averaged over all time series of the same T). Table 5 lists average segmentation accuracy c as a 
function of T and a (i.e. c is averaged over the 20 time series with the same T and a). As expected, 
segmentation accuracy is generally a decreasing function of a. 



T 


200 


250 


500 


750 


1000 


1250 


1500 




0.193 


0.249 


0.585 


1.024 


1.845 


3.026 


4.60 



Table 4. Average execution time T e (in seconds) as a function of average time series length T. 



T 


200 


250 


500 


750 


1000 


1250 


1500 


a 


c 


0.00 


1.0000 


1.0000 


1.0000 


0.9692 


1.0000 


1.0000 


0.9902 


0.10 


1.0000 


1.0000 


1.0000 


0.9814 


1.0000 


1.0000 


1.0000 


0.20 


1.0000 


0.9806 


1.0000 


1.0000 


1.0000 


0.9716 


1.0000 


0.30 


1.0000 


1.0000 


0.9999 


0.9792 


1.0000 


0.9807 


1.0000 


0.50 


0.9989 


0.9993 


0.9994 


0.9997 


1.0000 


0.9997 


1.0000 


0.75 


0.9945 


0.9979 


0.9663 


0.9521 


0.9988 


0.9992 


0.9991 


1.00 


0.9881 


0.9880 


0.9863 


0.9974 


0.9517 


0.9981 


0.9711 


1.25 


0.9778 


0.9710 


0.9762 


0.9924 


0.9965 


0.9843 


0.9781 


1.50 


0.9561 


0.9701 


0.9874 


0.9341 


0.9507 


0.9362 


0.9956 


1.75 


0.9337 


0.8985 


0.9494 


0.9341 


0.9708 


0.9272 


0.9942 


2.00 


0.8628 


0.8617 


0.8255 


0.9141 


0.8600 


0.9523 


0.8297 



Table 5. Average classif. accuracy c as a function of average time series length T and noise level a. 
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5 Conclusion 



In this paper we have used hidden Markov models to represent hydrological and enviromental time 
series with multiple change points. Inspired by Hubert's pioneering work and by methods of speech 
recognition, we have presented a fast iterative segmentation algorithm which belongs to the EM fam- 
ily. The quality of a particular segmentation is evaluated by the deviation from segment means, but 
extensions involving autoregressive HMM's, trend-generating HMM's etc. can also be used. Because 
execution time is 0(T • K 2 ), our algorithm can be used to explore various possible segmentations in 
an interactive manner. We have presented a convergence analysis which shows that under appropri- 
ate conditions every iteration of our algorithm increases the likelihood of the resulting segmentation. 
Furthermore, numerical experiments (involving river flow and global temperature time series) indicate 
that the algorithm can be expected to converge to the globally optimal segmentation. 

A Appendix: A Dynamic Programming Segmentation Algorithm 

In this appendix we present an alternative time series segmentation algorithm which, unlike the HMM 
algorithm, is guaranteed to produce the globally optimal segmentation of a time series. This superior 
performance, however, is obtained at the price of longer execution time. Still, the algorithm is com- 
putationally viable for time series of several hundred terms. We describe the algorithm briefly here; a 
more detailed report appears in [20]. 

A.l A General Segmentation Cost 

A generalization of the time series segmentation problem discussed in previous sections is the following. 
Given a time series x = (xi, x 2 , ■ ■■ , xt) and a fixed K, find a sequence of times t = (to, h, ... , tx) 
which satisfies = to < h < ■■■ < tx-i < tx = T, and minimizes 

K 

J K (t) = J2fk(tk-i,t k ;x)- (32) 

k=i 

Jx(t) consists of a sum of terms fk(tk-i, tk', x). For example, Hubert's cost function can be obtained 
by setting 

MM;x)= £ f*T- E 71 + f T ) • (33) 

T = S + l V S J 

Hence Hubert's segmentation cost (3) is a special case of (32). 
Similarly, consider autoregressive models of the form 

x t = u t A k + e t , (34) 

where t = i fc _i + 1, t fc _i + 2, ... , t k ) and u t = [1, x t -i, x t - 2 , x t -{\, A k = [a kjl , a fcj2 , a ki i]' (the ' 
denotes transpose of a matrix). Then we can set 

t 

f K (s,t;x) = ^2 {x T -u T A k f. (35) 

T = S + l 
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Then the segmentation cost becomes 

J K (t) = Y,e 2 T = Y, £ (xt-u t A k f. (36) 

T=S k=l t=tfc_l+l 

The 0^,1, a^, (elements of A&) are unknown, but can be determined by least squares fitting 

on £ tfc _ 1+ i, Xt k _ l +2, ••• , Xt k . A similar formulation can be used for regressive models of the form 
x t = u t A k + e t where A k = [a kfi , a k>1 , a k j]', u t = [1, (t - i fe _i), (t - t fe _i) 2 , {t - i fc _i) f ]. Hence 
we see that (32) is sufficiently general to subsume many cost functions of practical interest. 

A. 2 Dynamic Programming Segmentation Algorithm 

The following dynamic programming algorithm can be used to minimize (32); it has been presented in 
[1] and applies to very general versions of the time series segmentation problem. 



Dynamic Programming Segmentation Algorithm 

Input: The time series x = (xi, X2, #t); a termination number K. 

Initialization 

For t = 1,2, ...,T 

For s = 1,2,..., t 

d s ,t = Ik{s - M;x) 

End 

ct,o = d ltt 

End 

Minimization 

For k = 1,2,..., K 

For i = k,k + l,...,T 
For s = 0,1,...,*- 1 

e s = c S)k -i + d s+ i :t 

End 

= mini< s <t (e s ) 
zt,k = argmini< s <t (e s ) 

End 

End 

Backtracking 

For k = 1,2,..., K 
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tk,k — T 

For n = k — 1, k — 2, 1 



t n ,k ~ Z t n+hk ,n 

End 



to,k = 



End 



On termination, the dynamic programming segmentation algorithm has computed 



CT,k = 



min Jfe(t) 

t=(*o,*l, ■■■,tk) 



(37) 



for k = 1,2, in other words it has recursively solved a sequence of minimization problems. For 

k = 1,2, ...jK, the optimal segmentation = (to,fc) ii,fc> ••• > £fc,fc) has been obtained by backtracking. 

The recursive minimization is performed in the second part of the algorithm; it is seen that compu- 
tation time is 0(K ■ T 2 ). This is not as good as the 0(K 2 ■ T) obtained by the HMM algorithm (note 
that usually K is significantly less than T), but is still computationally viable for T in the order of a 
few hundreds. The backtracking part of the algorithm has execution time 0(K 2 ). 

However, in many cases the computationally most expensive part of the algorithm is the initializa- 
tion phase, i.e. the computation of d Sj t- This involves 0(T 2 ) computations of d s j = fx(s — l,t;x) and 
can increase the computation cost by one or more orders of magnitude. For example, if we apply the 
algorithm to detect changes in the mean, then 



which involves t — s + 1 addittions; if (38) is used in the initialization phase, then this phase requires 
0(T 3 ) computations and this severely limits computational viability to relatively short time series. 

Hence, to enhance the computational viability of the dynamic programming segmentation algo- 
rithm, it is necessary to find efficient ways to perform the initialization phase. In the next two sections, 
we will deal with this question for two specific forms of /^(s,t;x): the first form pertains to the 
computation of means and the second to the computation of regressions and autor egressions. 

A. 3 Fast Computation of Means 

The computation of means can be performed recursively, as will now be shown. For t = 1,2, ...,T, 
s = 1, 2, t — 1, we must compute 




(38) 




(39) 



For t 



= 1, 2, T, s = 1, 2, t, define the following additional quantities: 



Ps,t = 



yt 1 » 

/-^T=S 



Qs,t — Ps+l,t — Vs,t- 



(40) 
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Then we have 

t t 



ds,t = ^2{x T - p Stt ) 2 = (X s - p S: t) 2 + ^ (X T - p s ,t) 

T=S T = S+1 



and 

t 



T=s+1 r=s+l 
t 



=s+l 

t t t 

^2 ( x r - p s +i,t) 2 + (p*+m ~ ps ^ 2 + 2 ' ( Xt ~ Ps+i,t)(ps+i,t - p ; 

T = S + l T=S + 1 T = S + 1 

d s +i,t + (t-s)- (q s , t ) 2 + 2 • (p s +i,t ~ Ps,t) • ^2 Xr ~ ~ s )Ps+i,t =** 



\r=s+l 

t 

^2 {x T ~ Ps,t) 2 = d s+1:t + {t-s) ■ {q s , t ) 2 

T=S + 1 

From (41), (42) follows that (for t = 1, 2, T, s = 1, 2, i - 1) 

d Sjt = d s +i, t + (i - s) ■ {q s ,t) 2 + (x s - p s ,t) 2 - 
The above computations can be implemented in time 0(T 2 ) by the following algorithm. 

Recursive Computation of d St t 

For t = 1,2, ...,T 

M t , t = x t 
Pt,t = M tjt 

For s = t-l,t-2, ...,1 



M a , t = x s + M i+ i, t 

End 



P S J ~ t-s+1 



End 

For t = 1,2, ...,T 

For s = 1,2,.., t - 1 

Qs,t = (Ps+l,t - Ps,t) 

End 

End 

For t = 1,2,...,T 
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d t ,t = 

For s = t - 1, t— , 2, 1 



4,i = d s+ltt + (t- s) ■ (q S! t) + (x s - p s ,t) 2 - 



End 



End 



Hence, if the above code replaces the initialization phase of the dynamic programming algorithm 
in Section A. 2, we obtain an 0(K ■ T 2 ) implementation of the entire algorithm. In other words, we 
obtain an algorithm which, given a time series of length T, computes the global minimum of Hubert's 
segmentation cost (for all segmentations of orders K = 1,2, 3, T) in time 0(K ■ T 2 ) 

A. 4 Fast Computation of Regression Coefficients 

Consider now autoregressive models described by (34). As already mentioned, in this case we have 



/fc(ifc-i,£fc;x) = ^2 (x t -u t A k y 
t=t fc _i+i 



(44) 



Hence d S:t = fk{s - 1, t; x) is given by 



d s ,t = ^2 ( Xt ~ u t A ( s > t)) 2 ■ 



(45) 



where ut = [1, xt-i, xt-2, ■■■ , Xt-i] and A(s,t) is obtained by solving the least squares equation 

A(s, t) = (U{s, t)' ■ U{s, t)) _1 • U{s, t)' ■ X(s, t) (46) 

with 



X(s,t) = 



X s +1 



and U(s,t) = 



u s 

Us+l 



(47) 



Note that to solve (46) the matrix multiplications U (s, t)' ■ U(s, t), U(s, t)' ■ X(s, t) must be performed. 
For t = 1,2, ...,T, s = 1,2, ...,t, these multiplications require 0(T 5 ) time. However, the solution of 
(46) can be approximated by a fast recursive algorithm reported in [12]. Choose some small number 
5 and set 

1 



Po = yI 



(48) 



(where / is the (Z + 1) x (/ + 1) unit matrix). Then, consider the following recursion for s = 1, 2, T 
and t = s + 1, ...,T: 



U t = [l,X t -l,X t -2, -,x t -i], 
n = t — s, 

Pn = Pn-l - Pn-l ■ u' t ■ Uf P n -1 m 



l + Uf P n -l ■ u[ ' 



A(s, t) = A(s, t - 1) + P n ■ u' t ■ (xt - u t ■ A{s, t - 1)) . 



(49) 
(50) 

(51) 
(52) 
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Using the arguments of [12] for a fixed s and increasing t it can be shown that A(s, t) converges 
very quickly to A(s,t), the true solution of (46). Furthermore, the computations of (49)-(52) can be 
implemented in time 0(T 2 ). Hence, for the case of autoregressive models, the d s j computation can be 
programmed as follows. 



Recursive Computation of d s >t 

For s = 1,2, ...,T 

Initialize A(s,s) randomly 
d s , s =0 

For t = s + l,s + 2, ...,T 

Ut = [l,X t -l,X t -2, -,x t -i\ 
n = t — s 

Pn = Pn-1 ~ Pn-1 ■ u' t ■ U t ■ P n -1 ' l+„ t .p n _ 1 . u ; 

A(s, t) = A(s, t - 1) + P n ■ u' t ■ (x t - u t ■ A(s, t - 1)) 
d Syt = d Sjt -i + (x t - u t ■ A(s, t)^j 

End 

End 

Hence, if the above code replaces the initialization phase of the dynamic programming segmentation 
algorithm in Section A. 2, we have an 0(K-T 2 ) implementation of the entire algorithm for autoregressive 
models. A similar modification is possible for regressive models of the form (34). 



References 

[1] I.E. Auger and C.E. Lawrence. "Algorithms for the optimal identification of segment neighbor- 
hoods". Bui. of Math. Biol, vol.51, pp.39-54, 1989. 

[2] L.E. Baum and T.Petrie. "Statistical inference for probabilistic functions of finite state Markov 
chains". Ann. of Math. Stat, 1966, vol.37, pp.1554-1563. 

[3] L.E. Baum and J. A. Eagon. "An inequality with applications to statistical estimation for proba- 
bilistic functions of Markov processes and to a model for ecology". Bull. Amer. Math. Soc, vol.73, 
pp.360-363, 1967. 

[4] E. Baum et al. "A maximization technique occurring in the statistical analysis of probabilistic 
functions of Markov chains". Ann. of Math. Stat., vol.41, pp. 164-171, 1970. 

[5] D. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice Hall, Engle- 
wood Cliffs, New Jersey, 1987. 



20 



[6] T.A. Buishand. "Some methods for testing the homogeneity of rainfall records". J. HydroL, vol.58, 
pp.11-27, 1982. 

[7] T.A. Buishand. "Tests for detecting a shift in the mean of hydrological time series". J. HydroL, 
vol.75, pp.51-69, 1984. 

[8] G. W. Cobb. "The problem of the Nile: Conditional solution to a changepoint problem". Biometrika, 
vol. 65, pp. 243- 252, 1978. 

[9] A. P. Dempster, N. M. Laird and D. B. Rubin. "Maximum likelihood from incomplete data via the 
EM algorithm". J. Roy. Statist. Soc. B, vol.39, pp. 1-38, 1977. 

[10] R.J. Elliot, L. Aggoun and J.B. Moore. Hidden Markov Models. Springer, New York, 1995. 

[11] G. Forney. "The Viterbi algorithm". Proceedings of the IEEE, vol. 61, pp. 268-278, 1973. 

[12] D. Graupe. Identification of systems. Van Nostrand, Reinhold, New York, 1972. 

[13] J.D. Hamilton. "Analysis of time series subject to changes in regime." J. of Econometrics, vol.45, 
pp.39-70, 1990. 

[14] A.I. Hipel and K.W. McLeod. Time Series Modelling of Water Resources and Environmental 
Systems. Elsevier, 1994. 

[15] H. Hoppe and G. Kiely. "Precipitation over Ireland - Observed changes since 1940". Phys. Chem. 
Earth (B), vol.24, pp.91-96, 1999. 

[16] P. Hubert. "Change points in meteorological analysis". In Applications of Time Series Analysis 
in Astronomy and Meteorology, T.Subba Rao, M.B. Priestley and O. Lessi (eds.). Chapman and 
Hall, London, 1997. 

[17] P. Hubert. "The segmentation procedure as a tool for discrete modeling of hydrometeorogical 
regimes". Stoch. Env. Res. and Risk Ass., vol. 14, pp. 297-304, 2000. 

[18] B.H. Juang. "Maximum likelihood estimation for mixture multivariate stochastic observations of 
Markov chains". ATT Tech. J., vol.64, pp.1235-1249, 1985. 

[19] B.H. Juang and L.R. Rabiner. "Mixture autoregressive hidden Markov models for speech signals". 
IEEE Trans, on Acoustics, Speech and Signal Processing, vol. 33, pp. 1404-1412, 1985. 

[20] Ath. Kehagias, A. Nicolaou, V. Petridis and P. Fragou. "Some dynamic programming algorithms 
for time series segmentation" . In preparation. 

[21] G. Kiely, J.D. Albertson and M.B. Parlange. "Recent trends in diurnal variation of precipitation 
at Valentia on the west coast of Ireland". J. HydroL, vol.207, pp. 270-279, 1998. 

[22] A. Krogh et al. "Hidden Markov models in computational biology: applications to protein model- 
ing". J. Mol. Biol., vol.235, pp.150-1531, 1994. 

[23] H.M. Krolzig. Markov Switching Vector Autoregressions. Springer, 1997. 

[24] S.E. Levinson, L. R. Rabiner and M.M. Sondhi. "An introduction to the application of the theory 
of probabilistic functions of a Markov chain", The Bell Sys. Tech. J., vol. 62, pp.1035-1074, 1983. 



21 



[25] Z.Q. Lu and L.M. Berliner. "Markov switching time series models with application to a daily 
runoff series". Water Resour. Res., vol. 35, pp. 523-534, 1999. 

[26] M.E. Mann, R.S. Bradley and M.K. Hughes. "Northern hemisphere temperatures during the past 
millennium: inferences, uncertainties, and limitations". Geophys. Res. Lett., vol.26, pp. 759- 762, 
1999. 

[27] L. Perreault, M. Hache, M. Slivitzky and B. Bobee. "Detection of changes in precipitation and 
runoff over eastern Canada and U.S. using a Bayesian approach". Stock. Env. Res. and Risk Ass., 
vol. 13, pp.201-216, 1999. 

[28] L. Perreault, E. Parent, J. Bernier, B. Bobee and M. Slivitzky. "Retrospective multivariate 
Bayesian change-point analysis: a simultaneous single change in the mean of several hydrologi- 
cal sequences". Stoch. Env. Res. and Risk Ass., vol. 14, pp. 243-261, 2000. 

[29] L. Perreault, J. Bernier, B. Bobee and E. Parent. "Bayesian change-point analysis in hydrom- 
eteorological time series. Part 1. The normal model revisited". J. HydroL, vol. 235, pp. 221-241, 
2000. 

[30] L. Perreault, J. Bernier, B. Bobee and E. Parent. "Bayesian change-point analysis in hydromete- 
orological time series. Part 2. Comparison of change-point models and forecasting". J. HydroL, vol. 
235, pp.242-263, 2000. 

[31] L.R. Rabiner. "A tutorial on hidden Markov models and selected applications in speech recogni- 
tion", Proc. IEEE, vol. 77, pp.257-286, 1988. 

[32] A. Ramachandra Rao and W. Tirtotjondro. "Investigation of changes in characteristics of hy- 
drological time series by Bayesian methods". Stoch. HydroL and Hydraulics, vol. 10, pp. 295-317, 
1996. 

[33] M. Scheffe. The Analysis of Variance. Wiley, New York, 1959. 

[34] E. Servat et al. "Climatic variability in humid Africa along the Gulf of Guinea. Part I: detailed 
analysis of the phenomenon in Cote d' Ivoire". J. HydroL, vol.191, pp. 1-15, 1997. 



22 



xt 



zt 



Figure 1. A diagrammatic representation of a hidden Markov model. 
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Figure 2. Plot of the Senegal river annual discharge and the segment means. This figure was 
obtained from the optimal 5-th order segmentation 
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Figure 3. Plot of the annual mean global temperature and the segment means. This 
figure corresponds to the optimal fourth order segmentation. 




Figure 4. Plot of the annual mean global temperature and the AR estimate. This figure corrsponds 
to the optimal fourth order segmentation. 



